Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR

Author

Dan Lim

Abstract

Traditional approaches in artificial intelligence (AI) have solved problems that are difficult for humans but relatively easy for computers, provided they can be formulated as mathematical rules or formal languages. However, these symbolic, rule-based approaches failed on problems that humans solve intuitively, such as image recognition, natural language understanding, and speech recognition. Machine learning, a subfield of AI, has therefore tackled these intuitive problems by having the computer learn from data automatically instead of relying on human effort to extract complicated rules. Deep learning in particular, a kind of machine learning and the central theme of this thesis, has recently shown great popularity and usefulness. Its recent success is generally attributed to powerful computers, large datasets, and algorithmic improvements. These factors have enabled recent research to train deeper networks and achieve significant performance improvements. These research trends motivated me to pursue a deeper architecture for end-to-end speech recognition. In this thesis, I show experimentally that the proposed deep neural network achieves state-of-the-art results on the TIMIT speech recognition benchmark dataset. Specifically, the convolutional attention-based sequence-to-sequence model, which has deep stacked convolutional layers in the attention-based seq2seq framework, achieves a 15.8% phoneme error rate.

Similar resources

Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-base...

Gaussian Prediction Based Attention for Online End-to-End Speech Recognition

Recently, end-to-end speech recognition has attracted much attention. One of the popular models for end-to-end speech recognition is the attention-based encoder-decoder model, which usually generates output sequences iteratively by attending to the whole representations of the input sequences. However, predicting outputs until receiving the whole input sequence is not practical for online or low...

Anomaly-based Web Attack Detection: The Application of Deep Neural Network Seq2Seq With Attention Mechanism

Today, use of the Internet and of websites has become an integral part of people's lives, and most activities and important data reside on websites. Thus, attempts to intrude into these websites have grown exponentially. Intrusion detection systems (IDS) for web attacks are one approach to protecting users. But these systems suffer from such drawbacks as low accuracy in ...

Transfer Learning for Speech Recognition on a Budget

End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to t...

LCANet: End-to-End Lipreading with Cascaded Attention-CTC

Machine lipreading is a special type of automatic speech recognition (ASR) which transcribes human speech by visually interpreting the movement of related face regions including lips, face, and tongue. Recently, deep neural network based lipreading methods show great potential and have exceeded the accuracy of experienced human lipreaders in some benchmark datasets. However, lipreading is still...

Journal:

CoRR

Volume: abs/1710.04515

Publication date: 2017
